AITopics | belief-dependent reward

Collaborating Authors

belief-dependent reward

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Anytime Incremental $\rho$POMDP Planning in Continuous Spaces

Benchetrit, Ron, Lev-Yehudi, Idan, Zhitnikov, Andrey, Indelman, Vadim

arXiv.org Artificial IntelligenceFeb-4-2025

Partially Observable Markov Decision Processes (POMDPs) provide a robust framework for decision-making under uncertainty in applications such as autonomous driving and robotic exploration. Their extension, $\rho$POMDPs, introduces belief-dependent rewards, enabling explicit reasoning about uncertainty. Existing online $\rho$POMDP solvers for continuous spaces rely on fixed belief representations, limiting adaptability and refinement - critical for tasks such as information-gathering. We present $\rho$POMCPOW, an anytime solver that dynamically refines belief representations, with formal guarantees of improvement over time. To mitigate the high computational cost of updating belief-dependent rewards, we propose a novel incremental computation approach. We demonstrate its effectiveness for common entropy estimators, reducing computational cost by orders of magnitude. Experimental results show that $\rho$POMCPOW outperforms state-of-the-art solvers in both efficiency and solution quality.

artificial intelligence, belief revision, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.02549

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Aragón > Zaragoza Province > Zaragoza (0.04)
Europe > Czechia > Prague (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.93)

Add feedback

A POMDP Extension with Belief-dependent Rewards

Neural Information Processing SystemsApr-6-2023, 13:27:26 GMT

Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty on the state. To that end, we introduce rho-POMDPs, an extension of POMDPs where the reward function rho depends on the belief state. We show that, under the common assumption that rho is convex, the value function is also convex, what makes it possible to (1) approximate rho arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.

belief-dependent reward, convex, pomdp extension

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Dressel

AAAI ConferencesFeb-8-2022, 11:29:26 GMT

Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP -- a state-of-the-art offline solver -- to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hexcopter UAV searching for a radio frequency source--a challenging real-world problem.

belief-dependent reward, information-gathering task, solver, (2 more...)

AAAI Conferences

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Monte Carlo Information-Oriented Planning

Thomas, Vincent, Hutin, Gérémy, Buffet, Olivier

arXiv.org Artificial IntelligenceMar-21-2021

In this article, we discuss how to solve information-gathering problems expressed as rho-POMDPs, an extension of Partially Observable Markov Decision Processes (POMDPs) whose reward rho depends on the belief state. Point-based approaches used for solving POMDPs have been extended to solving rho-POMDPs as belief MDPs when its reward rho is convex in B or when it is Lipschitz-continuous. In the present paper, we build on the POMCP algorithm to propose a Monte Carlo Tree Search for rho-POMDPs, aiming for an efficient on-line planner which can be used for any rho function. Adaptations are required due to the belief-dependent rewards to (i) propagate more than one state at a time, and (ii) prevent biases in value estimates. An asymptotic convergence proof to epsilon-optimal values is given when rho is continuous. Experiments are conducted to analyze the algorithms at hand and show that they outperform myopic approaches.

algorithm, node, pomdp, (17 more...)

arXiv.org Artificial Intelligence

2103.11345

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

A POMDP Extension with Belief-dependent Rewards

Araya, Mauricio, Buffet, Olivier, Thomas, Vincent, Charpillet, Françcois

Neural Information Processing SystemsFeb-15-2020, 00:41:08 GMT

belief-dependent reward, convex, pomdp extension

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Efficient Decision-Theoretic Target Localization

Dressel, Louis (Stanford University) | Kochenderfer, Mykel J. (Stanford University)

AAAI ConferencesJun-14-2017

Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP--a state-of-the-art offline solver--to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hex-copter UA V searching for a radio frequency source--a challenging real-world problem.

belief-dependent reward, pomdp-lite, solver, (15 more...)

AAAI Conferences

Twenty-Seventh International Conference on Automated Planning and Scheduling

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Aerospace & Defense (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback